Project 1: Feature Detection and Matching
By Kevin Yang (ky238@cornell.edu)
Note: I know the results from this project are very poor; I believe there is a problem with the MOPS descriptor. However, despite spending a significant amount of time looking for the error, I was unable to find it. I would really appreciate it if someone could point out my mistake.
Feature Descriptor
For my feature descriptor, I implemented a descriptor based on the Scale Invariant Feature Transform (SIFT) descriptor. My version is simpler than the full SIFT descriptor: it implements only the basic idea. For each detected feature, I took the 16x16 window around it and accumulated the edge orientations in that window into a histogram with 8 bins. However, unlike the SIFT descriptor, I did not split the window into a 4x4 grid of cells; instead, I kept all the orientation data of the window in a single histogram.
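The single-histogram descriptor described above can be sketched as follows. This is a minimal NumPy sketch, not the code used to produce the results: the function name `orientation_histogram` is hypothetical, and weighting each vote by gradient magnitude and normalizing the histogram to unit length are assumptions borrowed from full SIFT rather than details stated in this report. For comparison, full SIFT uses a 4x4 grid of cells with 8 orientation bins each (a 128-dimensional vector) and measures orientations relative to the keypoint's dominant orientation; a single 8-bin histogram like this one discards where in the window each orientation occurs.

```python
import numpy as np


def orientation_histogram(image, x, y, window=16, bins=8):
    """Single gradient-orientation histogram over a window centred on (x, y).

    Hypothetical sketch: unlike full SIFT, the window is NOT split into a
    4x4 grid of cells; every pixel votes into one histogram.
    """
    half = window // 2
    # Clip the window to the image bounds (rows index y, columns index x).
    r0, r1 = max(y - half, 0), min(y + half, image.shape[0])
    c0, c1 = max(x - half, 0), min(x + half, image.shape[1])
    patch = image[r0:r1, c0:c1].astype(float)

    # Finite-difference gradients: np.gradient returns (d/drow, d/dcol).
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), 2 * np.pi)  # orientations in [0, 2*pi)

    # Assumption: each pixel votes with its gradient magnitude, as in SIFT.
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 2 * np.pi),
                           weights=magnitude)

    # Assumption: normalise to unit length so descriptors compare by SSD.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


if __name__ == "__main__":
    # Brightness increases left to right, so every gradient points along +x
    # (angle 0) and all of the weight lands in bin 0.
    ramp = np.tile(np.arange(32), (32, 1))
    print(orientation_histogram(ramp, 16, 16))
```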
Design Choices
I chose to implement SIFT because of its scale and rotation invariance, as well as its robustness in a variety of situations. This is a clear benefit, because such a descriptor can be used in many real-world situations where images vary greatly in illumination, scale, and rotation. However, given my time constraints, I was unable to implement the SIFT descriptor in its full form and instead implemented a simplified version of it.
Performance
The performance of the program is not particularly good. The algorithms clearly performed better on the Yosemite dataset than on the Graf dataset because the two Yosemite images are related by a translation, which makes all the descriptors used (simple, MOPS, and SIFT) more effective at identifying the correct match. The results are shown below.
Graf is a more difficult dataset because the two images were taken from different viewpoints. As a result, the second image is not only translated but also rotated and at a different scale. On this dataset the simple descriptor suffers badly, since it is robust to neither rotation nor scale changes: it compares only a non-oriented 5x5 pixel patch around each feature. The rather poor performance of the MOPS descriptor is somewhat surprising; intuitively, it should be more effective than the simple descriptor because, unlike the simple descriptor, it is oriented and does not rely only on the pixels immediately around a feature (rather, it subsamples a 41x41 region around the Harris feature). The ROC curve for MOPS is rather unrealistic. The best performance, unsurprisingly, comes from the SIFT descriptor. Because it is designed to be both scale invariant and rotation invariant, it performs much better than the other two descriptors. The results are shown below.
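For comparison against an implementation, the MOPS descriptor discussed above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the course's reference code: the helper names `bilinear_sample` and `mops_descriptor` are hypothetical, the keypoint's dominant orientation `angle` (in radians, image coordinates with y pointing down) is assumed to be computed elsewhere, and the image is assumed to be pre-blurred, since sampling every 5 pixels from an unsmoothed image aliases. An 8x8 grid with 5-pixel spacing covers roughly a 40x40 window, corresponding to the 41x41 region mentioned above; the samples are then normalized to zero mean and unit standard deviation.

```python
import numpy as np


def bilinear_sample(image, xs, ys):
    """Bilinearly interpolate image at float coordinates (xs, ys); 0 outside."""
    h, w = image.shape
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    fx, fy = xs - x0, ys - y0
    out = np.zeros(xs.shape, dtype=float)
    # Accumulate the four neighbouring pixels with their bilinear weights.
    for dy, dx, wgt in ((0, 0, (1 - fx) * (1 - fy)), (0, 1, fx * (1 - fy)),
                        (1, 0, (1 - fx) * fy), (1, 1, fx * fy)):
        yy, xx = y0 + dy, x0 + dx
        inside = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        out[inside] += wgt[inside] * image[yy[inside], xx[inside]]
    return out


def mops_descriptor(image, x, y, angle, size=8, spacing=5.0):
    """MOPS-style descriptor sketch (hypothetical helper).

    Samples a size x size grid, `spacing` pixels apart, rotated by `angle`
    about (x, y). Assumes `image` was already blurred before this call.
    """
    # Grid offsets in the feature's own frame, centred on the keypoint.
    offsets = (np.arange(size) - (size - 1) / 2.0) * spacing
    u, v = np.meshgrid(offsets, offsets)  # u: feature x-axis, v: feature y-axis
    c, s = np.cos(angle), np.sin(angle)
    # Rotate the grid into image coordinates.
    xs = x + c * u - s * v
    ys = y + s * u + c * v
    patch = bilinear_sample(image.astype(float), xs, ys)

    # Bias/gain normalisation; a flat patch has no usable structure.
    std = patch.std()
    if std < 1e-10:
        return np.zeros(size * size)
    return ((patch - patch.mean()) / std).ravel()
```

One way to check code like this is to rotate a test image with `np.rot90` and confirm that the descriptor at the corresponding keypoint is unchanged when the angle is reduced by pi/2 (in this sketch's y-down convention); a swapped x/y index or a flipped rotation sign would show up immediately as a mismatch.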
| Descriptor | Matching | graf | leuven | bikes | wall | Average |
|---|---|---|---|---|---|---|
| Simple | SSD | 0.610545 | 0.430206 | 0.466590 | 0.303750 | 0.4527728 |
| Simple | Ratio Test | 0.675835 | 0.647325 | 0.675835 | 0.637598 | 0.6591483 |
| MOPS | SSD | 0.110549 | 0.134677 | 0.365504 | 0.083914 | 0.173661 |
| MOPS | Ratio Test | 0.149928 | 0.180537 | 0.050388 | 0.099588 | 0.1201103 |
| MySIFT | SSD | 0.621322 | 0.226715 | 0.345660 | 0.176709 | 0.342602 |
| MySIFT | Ratio Test | 0.403039 | 0.299812 | 0.380727 | 0.169910 | 0.313372 |
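The two matching criteria in the table can be sketched as follows. This assumes SSD is the sum of squared differences between two descriptor vectors and that the ratio test divides the best match's SSD by the second-best match's SSD; `match_features` is a hypothetical name, not part of any library. Lowe's original ratio test uses Euclidean distances instead, but since that ratio is just the square root of this one, both rank matches identically.

```python
import numpy as np


def match_features(desc1, desc2):
    """Nearest-neighbour matching of desc1 rows against desc2 rows.

    Returns (best_indices, ssd_scores, ratio_scores); lower scores mean
    more confident matches. desc2 must contain at least two descriptors.
    """
    # Pairwise SSD between every descriptor pair, shape (n1, n2).
    diff = desc1[:, None, :] - desc2[None, :, :]
    d2 = np.sum(diff * diff, axis=2)

    order = np.argsort(d2, axis=1)
    rows = np.arange(len(desc1))
    best, second = order[:, 0], order[:, 1]
    ssd = d2[rows, best]

    # Ratio test: best SSD over second-best SSD (guarded against zero).
    ratio = ssd / np.maximum(d2[rows, second], 1e-12)
    return best, ssd, ratio
```

The brute-force pairwise array uses memory proportional to n1 * n2 * d, which is fine for a few thousand features; larger sets would call for chunking or an approximate nearest-neighbour structure.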
Strengths & Weaknesses
One strength of the baseline SIFT descriptor is its robustness even in difficult situations (e.g., differences in rotation and scale). This explains the popularity of the SIFT descriptor in the computer vision community. The MOPS descriptor displayed surprisingly poor performance; it seems likely that this is due to an error in the implementation rather than in the algorithm itself. MySIFT surprisingly did not perform as well as the simple descriptor. Perhaps the reason for this is because